The select system call is used to monitor file descriptors of interest for readable, writable, and exception events, waiting for up to a specified period of time.
Why is there a select model?
Look at the following code, which is common in socket programming:
int iResult = recv(sock, buffer, 1024, 0);
This call receives data. With a socket in the default blocking mode, recv blocks until there is readable data on the connection, and it returns only after copying the data into the buffer. In a single-threaded program this blocks the main thread (the only thread by default), so if no data ever arrives, the entire program stays blocked there, which is not what we want. We want the rest of the program to keep running instead of being blocked on an I/O operation such as recv.
This issue can be solved with multithreading, but when there are many socket connections that is not a good choice, because it scales poorly.
Let’s look at another piece of code:
unsigned long ul = 1;  // nonzero enables non-blocking mode
int iResult = ioctlsocket(sock, FIONBIO, &ul);
iResult = recv(sock, buffer, 1024, 0);
This time the recv call returns immediately, whether or not there is data to receive on the connection, because ioctlsocket has put the socket into non-blocking mode. However, if you check the result, you will find that when no data is available, recv returns SOCKET_ERROR and WSAGetLastError() reports WSAEWOULDBLOCK, which means the operation could not be completed immediately and should be retried later.
Seeing this, you might think we could call recv repeatedly and check the return value until it succeeds, but such polling is problematic and costly: the program burns CPU time while nothing is happening. Periodic checking like this should be avoided.
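The Winsock snippet above needs a connected socket to try out. On POSIX systems the same behavior can be reproduced with any descriptor, so the sketch below is an analogue rather than the Winsock code: it swaps in fcntl with O_NONBLOCK for ioctlsocket/FIONBIO, and the errno values EAGAIN/EWOULDBLOCK for WSAEWOULDBLOCK. The helper names set_nonblocking and try_read are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode, the POSIX
 * counterpart of ioctlsocket(sock, FIONBIO, &ul) with ul != 0.
 * Returns 0 on success, -1 on error. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: one non-blocking read attempt.
 * Returns the number of bytes read, 0 on end-of-file, -2 if no data is
 * available yet (EAGAIN/EWOULDBLOCK, the POSIX analogue of WSAEWOULDBLOCK),
 * or -1 on any other error. */
ssize_t try_read(int fd, char *buf, size_t len) {
    ssize_t n = read(fd, buf, len);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return n;
}
```

Looping on try_read until it stops returning -2 is exactly the polling approach just described: the loop spins on the CPU while no data arrives.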
The select model is designed to solve these problems.
The key idea of the select model is to manage and schedule multiple sockets uniformly, in an orderly way. The following sequence diagram of the select model illustrates this.
As shown above, the user first registers the sockets that need I/O with select, then blocks waiting for the select call to return. When data arrives, the corresponding socket becomes ready and select returns. The user thread then issues the actual read request, reads the data, and continues execution.
In terms of process flow, performing I/O through select is not much different from the synchronous blocking model. It even adds extra work: registering the sockets to monitor and making the additional select call. However, the biggest advantage of select is that a single thread can handle I/O requests on multiple sockets at the same time. The user registers multiple sockets and then repeatedly calls select, reading from whichever sockets it reports as ready, so that one thread can serve many I/O requests.
In the synchronous blocking model, this can only be achieved with multithreading.
The select workflow is shown in the following pseudocode (only to illustrate the process of the select model):
{
    select(socket);                 // register the socket(s) with select
    while (1) {
        sockets = select();         // block until some sockets are ready
        for (socket in sockets) {
            if (can_read(socket)) {
                read(socket, buffer);
                process(buffer);
            }
        }
    }
}
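The pseudocode maps onto real POSIX calls roughly as in the following sketch. It assumes the descriptors are already open (sockets, pipes, and standard input all behave the same way with select); select_round and process are hypothetical names, and process here merely counts bytes.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical process() step from the pseudocode: it only counts bytes. */
static size_t total_bytes = 0;

static void process(const char *buf, ssize_t n) {
    (void)buf;
    total_bytes += (size_t)n;
}

/* One pass of the select loop over fds[0..count-1].
 * Returns select's result: ready count, 0 on timeout, or -1 on error. */
int select_round(const int *fds, int count, struct timeval *timeout) {
    fd_set rset;
    int maxfd = -1;

    /* Register every descriptor. The set is rebuilt on each pass because
     * select overwrites it with the ready descriptors only. */
    FD_ZERO(&rset);
    for (int i = 0; i < count; i++) {
        FD_SET(fds[i], &rset);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    int ready = select(maxfd + 1, &rset, NULL, NULL, timeout);
    if (ready <= 0)
        return ready;                      /* timeout or error */

    for (int i = 0; i < count; i++) {
        if (FD_ISSET(fds[i], &rset)) {     /* can_read(socket) */
            char buf[1024];
            ssize_t n = read(fds[i], buf, sizeof buf);
            if (n > 0)
                process(buf, n);           /* process(buffer) */
        }
    }
    return ready;
}
```

A server would run `while (1) select_round(fds, count, NULL);` so that each pass blocks until at least one descriptor is ready, mirroring the while (1) loop in the pseudocode.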
Related APIs of the select model
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>
#include <unistd.h>
int select(int maxfdp, fd_set * readset, fd_set * writeset, fd_set * exceptset, struct timeval * timeout);
Here are the parameter descriptions:
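maxfdp: one greater than the highest-numbered file descriptor in any of the three sets, because file descriptors are numbered from 0. The kernel checks descriptors 0 through maxfdp - 1, so this is not simply the number of descriptors being monitored;
readset, writeset, exceptset: point to the sets of descriptors to check for readable, writable, and exception events, respectively. Pass NULL for any event type you are not interested in. On return, select modifies each set so that it contains only the descriptors that are ready.
timeout: sets the timeout of select, that is, tells the kernel the maximum time to wait. timeout == NULL means wait indefinitely; a timeval whose fields are both 0 makes select return immediately (polling).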
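Because maxfdp is derived from the largest descriptor rather than from how many descriptors are watched, it is usually computed while filling the set. A minimal sketch; build_read_set is a hypothetical helper name:

```c
#include <sys/select.h>

/* Hypothetical helper: add fds[0..count-1] to *set and return the value
 * to pass as select's first argument (highest descriptor plus one). */
int build_read_set(const int *fds, int count, fd_set *set) {
    int maxfd = -1;
    FD_ZERO(set);
    for (int i = 0; i < count; i++) {
        FD_SET(fds[i], set);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    return maxfd + 1;
}
```

For descriptors 0, 7, and 4, three descriptors are watched, but the value to pass as maxfdp is 8.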
The type of timeout is the timeval structure. The structure is defined as follows:
struct timeval {
    time_t      tv_sec;   /* seconds */
    suseconds_t tv_usec;  /* microseconds */
};
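Timeouts are often expressed in milliseconds elsewhere in a program. The hypothetical helper below converts one into a struct timeval, assuming a non-negative input. Note that on Linux, select overwrites *timeout with the time remaining, so a timeval should be re-initialized before every call.

```c
#include <sys/time.h>

/* Hypothetical helper: convert a non-negative millisecond count into a
 * struct timeval. tv_usec stays within 0..999999 as select requires. */
struct timeval ms_to_timeval(long ms) {
    struct timeval tv;
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return tv;
}
```

ms_to_timeval(0) yields the all-zero timeval that makes select poll without waiting.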
Return value of select:
0 on timeout;
-1 on failure (errno indicates the error);
a positive integer on success: the total number of ready descriptors across all three sets.
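The three return cases are usually handled together. The sketch below wraps select for a single descriptor and, as an addition not discussed in the text, retries when the wait is interrupted by a signal (select then fails with errno set to EINTR). wait_readable and read_with_timeout are hypothetical names.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error (errno set by select).
 * On EINTR the full timeout restarts, which is acceptable for a sketch. */
int wait_readable(int fd, long timeout_ms) {
    for (;;) {
        fd_set rset;
        struct timeval tv;
        FD_ZERO(&rset);
        FD_SET(fd, &rset);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        int n = select(fd + 1, &rset, NULL, NULL, &tv);
        if (n == -1 && errno == EINTR)
            continue;           /* interrupted by a signal: wait again */
        return n;               /* 0 = timeout, 1 = ready, -1 = error */
    }
}

/* Hypothetical helper: read without blocking past timeout_ms.
 * Returns bytes read, 0 on EOF, -1 on error, or -2 on timeout. */
ssize_t read_with_timeout(int fd, char *buf, size_t len, long timeout_ms) {
    int r = wait_readable(fd, timeout_ms);
    if (r == 0)
        return -2;              /* timed out: nothing to read yet */
    if (r == -1)
        return -1;
    return read(fd, buf, len);  /* select reported fd ready, so no blocking */
}
```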
Here are some common macros related to the select function:
#include <sys/select.h>
void FD_ZERO(fd_set *fdset);           // Set all bits of an fd_set variable to 0
void FD_CLR(int fd, fd_set *fdset);    // Clear the bit for file descriptor fd
void FD_SET(int fd, fd_set *fdset);    // Set the bit for file descriptor fd
int  FD_ISSET(int fd, fd_set *fdset);  // Nonzero if the bit for fd is set
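Since an fd_set is just a bit array, the macros can be tried out without opening any descriptors. The hypothetical count_ready helper walks the bits with FD_ISSET; applied to a set returned by select with the same maxfdp, it gives the number of ready descriptors in that set.

```c
#include <sys/select.h>

/* Hypothetical helper: count the descriptors 0..maxfdp-1 whose bit is set. */
int count_ready(fd_set *set, int maxfdp) {
    int n = 0;
    for (int fd = 0; fd < maxfdp; fd++)
        if (FD_ISSET(fd, set))
            n++;
    return n;
}
```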
When an fd_set is declared, all of its bits must first be cleared with FD_ZERO. Then set the bits corresponding to the descriptors we are interested in, as follows:
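fd_set fdset;
int fd;                          // a descriptor of interest, e.g. a socket
FD_ZERO(&fdset);
FD_SET(fd, &fdset);
FD_SET(STDIN_FILENO, &fdset);    // standard input is descriptor 0
Then call select, which blocks until an event arrives on one of the descriptors. If a timeout is supplied and it expires first, select stops waiting and returns 0; passing NULL, as here, waits indefinitely. The first argument must be the highest descriptor in the sets plus one:
int maxfd = fd > STDIN_FILENO ? fd : STDIN_FILENO;
select(maxfd + 1, &fdset, NULL, NULL, NULL);
After select returns, use FD_ISSET to test whether a descriptor's bit is still set, that is, whether that descriptor is ready: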
if (FD_ISSET(fd, &fdset)) {
    ...
    // Do something
}
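One detail matters when select is called repeatedly: on return, each set holds only the descriptors that are ready, so a descriptor that was not ready has had its bit cleared. The hypothetical function below demonstrates this with two pipes, only one of which has data.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical demo: create two pipes, write into the second only, and ask
 * select about both read ends. Returns a bitmask of the descriptors still
 * set afterwards (bit 0: first pipe, bit 1: second pipe), or -1 on error. */
int demo_select_overwrites_set(void) {
    int a[2], b[2];
    if (pipe(a) == -1)
        return -1;
    if (pipe(b) == -1) {
        close(a[0]);
        close(a[1]);
        return -1;
    }
    int result = -1;
    if (write(b[1], "x", 1) == 1) {
        fd_set rset;
        struct timeval tv = { 0, 0 };       /* poll: do not wait */
        FD_ZERO(&rset);
        FD_SET(a[0], &rset);                /* both read ends go in... */
        FD_SET(b[0], &rset);
        int maxfd = a[0] > b[0] ? a[0] : b[0];
        if (select(maxfd + 1, &rset, NULL, NULL, &tv) >= 0)
            /* ...but only the ready one is still set afterwards */
            result = (FD_ISSET(a[0], &rset) ? 1 : 0)
                   | (FD_ISSET(b[0], &rset) ? 2 : 0);
    }
    close(a[0]);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return result;
}
```

It returns 2: the first pipe's read end was registered but has been removed from the set. This is why a loop must rebuild its sets with FD_ZERO and FD_SET before every call to select.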
Here is a simple, complete program that shows how to use select.
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>

int main() {
    fd_set rd;
    struct timeval tv;
    int err;

    FD_ZERO(&rd);
    FD_SET(0, &rd);             // watch standard input (descriptor 0)

    tv.tv_sec = 5;              // wait at most 5 seconds
    tv.tv_usec = 0;

    err = select(1, &rd, NULL, NULL, &tv);  // highest fd is 0, so maxfdp = 1
    if (err == 0) {             // timeout
        printf("select timeout!\n");
    } else if (err == -1) {     // failure
        printf("fail to select!\n");
    } else {                    // success: stdin is readable
        printf("data is available!\n");
    }
    return 0;
}
If we run the program and type some input followed by Enter within 5 seconds, it prints "data is available!".
If we run the program and do nothing for 5 seconds, it prints "select timeout!".
The select model is one of the most common approaches to I/O multiplexing. By calling select, an application can find out whether data is ready to be read and whether data can be written, so it does not have to block until an I/O operation completes.
As the example above shows, select operates on an fd_set, a set of descriptors, which means it can wait on multiple I/O operations at once.
However, the select model has some disadvantages. For example:
Each call to select copies the fd_set from user space into kernel space. This overhead becomes large when there are many file descriptors.
In addition, on each call the kernel has to scan every descriptor passed in, which is also expensive when there are many descriptors.
The number of file descriptors select supports is small: descriptors must be below FD_SETSIZE, which defaults to 1024 (for example, on Linux with glibc).
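The FD_SETSIZE limit is not enforced for you: calling FD_SET with a descriptor outside 0..FD_SETSIZE - 1 is undefined behavior. A server that keeps using select should therefore check each new descriptor, as in this minimal sketch (fits_in_fd_set is a hypothetical name):

```c
#include <sys/select.h>

/* Hypothetical guard: return 1 if fd can safely be stored in an fd_set,
 * 0 otherwise. FD_SET on a descriptor >= FD_SETSIZE would write past the
 * end of the set. */
int fits_in_fd_set(int fd) {
    return fd >= 0 && fd < FD_SETSIZE;
}
```

Descriptors that fail the check have to be rejected (for example, by closing the connection), or the program has to move to poll or epoll, which do not have this limit.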
In the next article, I will dive deeper into the select model so that we can understand its advantages and disadvantages in more detail. I will also introduce more advanced models such as poll and epoll in the future.